ggml-virtgpu: make the code thread safe #19204
taronaeo
left a comment
I see that some of the GGML_ABORT statements do not include __func__. Would it be better to include them for debugging and error tracing in the future? :)
Thanks for the feedback, I applied the suggestions.
I'm not sure about this one: it's not a common pattern in the code base, and when I hit an abort I automatically get a stack trace, so I don't think it's necessary. Do you?
I usually do it for bug-reporting purposes, so it's easier to identify which line of code is failing and the order in which things happened. But it's just a suggestion, feel free to ignore it if we aren't expecting X specific lines of
Edit: Also, another thought. Most users are end consumers who do not compile from source and instead use a pre-built release binary. IIRC (I may be wrong), release builds do not show full backtrace information for the failing lines of code leading to the abort, or it may have been optimized out, which makes debugging much harder.
Good point, I missed that. Anyway, I'll think about it. I want to improve the error message when running in an unsupported environment (no virtgpu: this one should be good; unpatched virglrenderer: this one can be improved, I guess).
I've updated the code with the cleaner logging, and I finally followed your suggestion. I also reworked the abort on init so that it no longer aborts if the virtgpu isn't detected.
taronaeo
left a comment
Minor formatting changes :)
Good catches, thanks. Fixed as suggested :)
Great! Let's wait for CI to go green, then we merge.
* ggml-virtgpu: regenerate_remoting.py: add the ability to deprecate a function
* ggml-virtgpu: deprecate buffer_type is_host remoting
  not necessary
* ggml-virtgpu: stop using static vars as cache
  The static init isn't thread safe.
* ggml-virtgpu: protect the use of the shared memory to transfer data
* ggml-virtgpu: make the remote calls thread-safe
* ggml-virtgpu: backend: don't continue if couldn't allocate the tensor memory
* ggml-virtgpu: add a cleanup function for consistency
* ggml-virtgpu: backend: don't crash if buft->iface.get_max_size is missing
* fix style and ordering
* Remove the static variable in apir_device_get_count
* ggml-virtgpu: improve the logging
* fix review minor formatting changes
This PR makes the ggml-virtgpu backend thread safe in two ways: a mutex now guards access to the host<>guest shared memory buffers, and the constant values queried from the backend are cached once, during initialization.
The unused buffer_type_is_host method is also deprecated.
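
For readers who want a picture of the two techniques described above, here is a minimal sketch. It is not the code from this PR: `shared_region`, `device_ctx`, `device_get_max_size`, `transfer` and `count_corrupted_transfers` are hypothetical names invented for illustration, and a plain `std::vector<char>` stands in for the real host<>guest shared memory. It assumes the general pattern is "lock around every use of the shared buffer" plus "fill constants once at init instead of lazily in static variables".

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the host<>guest shared memory buffer.
struct shared_region {
    std::mutex        lock;   // serializes every use of `bytes`
    std::vector<char> bytes;

    explicit shared_region(size_t size) : bytes(size) {}
};

// Hypothetical device context. The constants are filled in once, during
// initialization, instead of being cached lazily in static variables.
struct device_ctx {
    shared_region shmem;
    std::string   name;
    size_t        max_size;

    device_ctx(size_t shmem_size, std::string dev_name, size_t dev_max_size)
        : shmem(shmem_size), name(std::move(dev_name)), max_size(dev_max_size) {}
};

// The value is read-only after construction, so no lock is needed here.
size_t device_get_max_size(const device_ctx & ctx) {
    return ctx.max_size;
}

// Round-trip `n` bytes through the shared buffer while holding its lock.
// Returns false if the payload does not fit in the buffer.
bool transfer(shared_region & shmem, const void * src, void * dst, size_t n) {
    assert(src != nullptr && dst != nullptr);
    if (n > shmem.bytes.size()) {
        return false;
    }
    std::lock_guard<std::mutex> guard(shmem.lock);
    std::memcpy(shmem.bytes.data(), src, n);  // writer side fills the buffer
    std::memcpy(dst, shmem.bytes.data(), n);  // reader side drains it
    return true;
}

// Run `n_threads` concurrent transfers and count the corrupted results.
int count_corrupted_transfers(device_ctx & ctx, int n_threads, int n_iters) {
    std::atomic<int>         corrupted{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&ctx, &corrupted, t, n_iters] {
            // Each thread sends its own byte pattern through the shared buffer.
            std::vector<char> src(64, static_cast<char>('a' + t));
            std::vector<char> dst(64);
            for (int i = 0; i < n_iters; ++i) {
                if (!transfer(ctx.shmem, src.data(), dst.data(), src.size()) || src != dst) {
                    ++corrupted;
                }
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }
    return corrupted.load();
}
```

Without the `std::lock_guard` in `transfer`, two threads could interleave their copies and read back each other's bytes. Keeping the lock inside the helper means callers cannot forget to take it, and storing the constants in the context at construction avoids any first-call race on a lazily initialized cache.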